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Abstract 



A popular line of research in evolutionary biology is the use of time-calibrated phyloge- 
nies for the inference of diversification processes. This requires computing the likelihood of 
a given ultrametric tree as the reconstructed tree produced by a given model of diversifica- 
tion. Etienne & Rosindell (2012) proposed a lineage-based model of diversification, called 
protracted speciation, where species remain incipient during a random duration before turn- 
ing good species, and showed that this can explain the slowdown in lineage accumulation 
observed in real phylogenies. However, they were unable to provide a general likelihood 
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formula. Here, we present a likelihood formula for protracted speciation models, where 
rates at which species turn good or become extinct can depend both on their age and on 
time. Our only restrictive assumption is that speciation rate does not depend on species 
status. 

Our likelihood formula utilizes a new technique, based on the contour of the phylogenetic 
tree and first developed in Lambert (2010). We consider the reconstructed trees spanned 
by all extant species, by all good extant species, or by all representative species, which are 
either good extant species or incipient species representative of some good extinct species. 
Specifically, we prove that each of these trees is a coalescent point process, that is, a planar, 
ultrametric tree where the coalescence times between two consecutive tips are independent, 
identically distributed random variables. We characterize the common distribution of these 
coalescence times in some, biologically meaningful, special cases for which the likelihood 
reduces to an elegant analytical formula or becomes numerically tractable. 



Running head. The reconstructed tree in the protracted speciation model. 
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1 Introduction 

A central question in evolutionary biology is to infer the nature of processes which have shaped 
the contemporaneous patterns of biodiversity. A popular approach is to use time-calibrated 
phylogenies of extant species (starting with Nee et al. 1994) which have been independently 
built, e.g., from interspecific gene sequence information. The aim is to choose, among a class of 
models of speciation and extinction, the ones that are the most likely to have generated a given 
phylogeny, by maximum likelihood or Bayesian methods. One of the key steps in this process 
is to evaluate the likelihood of a phylogeny under a given model of diversification, for exam- 
ple where species are viewed as particles that can reproduce (speciation) or die (extinction) 
independently at random times. For such so-called lineage-based models of diversification, it is 
elementary to compute the likelihood of the whole species tree, but it is a more complicated task 
to compute the likelihood of the species tree spanned by extant species, also called the recon- 
structed tree. Reconstructed trees are ultrametric trees, in the sense that all tips are at the same 
distance to the root. The probability distribution of the reconstructed tree is well-known for the 
linear birth-death process of diversification, where lineages are assumed to reproduce and die 
independently, at exponential rates possibly varying in time (see Nee et al. 1994, following the 
seminal work of Kendall 1948). This distribution is also known in the case of binomial sampling 
when only a fraction of extant lineages is sampled, independently with a certain fixed probabil- 
ity |Morlon et ah, 20TT Stadler, 2011 Hallinan, 2012| . A specific feature of the reconstructed 



tree generated by a linear birth-death process with binomial sampling is that its topology is 
uniform over topologies with ranked node splitting times, and that node splitting times, or 
node depths, are independent and identically distributed (fid). Ultrametric trees satisfying 
this property are called coalescent point processes |Popovic, 2004[ Aldous and Popovic, 2005] , 



It is proved in Lambert (2010) and Lambert & Stadler (2013) that this result is robust to the 
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Markov assumption, that is, it holds even if species lifetimes are not exponentially distributed, 
or otherwise put, when extinction rates may depend on the species age. 



Etienne & Rosindell (2012) have proposed a lineage-based model of diversification, called 
protracted speciation model, where newborn species are so-called incipient species and become 
so-called good species after some exponentially distributed time. This model is a lineage- 
based version of the individual-based protracted speciation model of Rosindell et al. (2010), 
and can explain the slowdown in lineage accumulation observed in real phylogenies, a phe- 
nomenon that could indeed be due to the fact that populations experiencing recent speciation 
are not detected as actual species. Alternative explanations include the dependence of spe- 



ciation or extinction rates upon the overall number of species |Rabosky and Lovette, 2 008 



Etienne et al., 2012, Etienne and Hacgeman, 2012 , ecological speciation |McPeek, 200 8 1, and 



geographic speciation Pigot et al., 2010 



Here, we consider a generalization of this model, where the times spent in the incipient 
stage (or in several incipient stages) and in the good stage can be correlated and have in- 
homogeneous and general distributions, that is, when the rates at which species can change 
type or become extinct may depend on time or on their age (and type). The interpretation 
of protracted speciation is that newly founded populations (i.e., incipient species) cannot be 
discriminated from their mother population before enough time has elapsed to complete ge- 
netic differentiation and/or reproductive isolation. In this view, all extant incipient species 
descending, by a chain of incipient species, directly from the same good species, are considered 
as a cloud of satellite populations belonging to the same species. This cloud must only have 
one representative species in the phylogenetic tree. If the ancestor good species a of the cloud 
is extant, then a is the natural representative species of the cloud. Otherwise, we set up a 
natural rule to define which of the extant descending incipient species of a is the representative 
species. Roughly speaking, the one representative species of an extinct good species a is cho- 
sen as the last incipient species among species descending from a by a chain of incipient species. 



We study the reconstructed tree spanned by all extant species, by representative species 
and by good extant species (by decreasing order of inclusion). We prove that if the speciation 
rate does not depend on species status, then all three reconstructed trees are given by a coa- 
lescent point process, and we provide numerical methods to compute the common distribution 
of node depths in each case. We also provide a closed formula in the case of the reconstructed 
tree spanned by all extant species, as well as by good extant species, in the original setting of 
Etienne & Rosindell (2012) when rates are age-independent and do not vary with time. 

Hereafter, we will make a difference between the terms phylogenetic tree and reconstructed 
tree. The phylogenetic tree at time T is the tree with edge lengths obtained after throwing 
away all points at distance larger than T from the root (the future of T). The reconstructed 
tree is the tree obtained from the phylogenetic tree after removing all lineages that are extinct 
by time T. 

In the next section, we specify the model assumptions, extending the constant rate model 
of Etienne &: Rosindell (2012) in two directions: the homogeneous model, where rates can be 
stage-dependent, and the Markov model, where rates can be time-dependent. We also define 
the so-called ultimogeniture order on the (finite) set of species of the phylogenetic tree, and 
use this order to define the rule for the choice of representative species. In Section 3, we 
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propose a first numerical method to compute the likelihood of the reconstructed trees, which is 
a natural follow-up to Etienne & Rosindell (2012), involving an infinite set of coupled ordinary 
differential equations. In Section 4, we give the rigorous definition of a coalescent point process, 
we introduce a total order on the phylogenetic tree embedded in continuous time, and use this 
to prove that the reconstructed tree of all extant species is a coalescent point process. In 
Section 5, we propose two methods, one for the homogeneous model, and one for the Markov 
model, to compute the coalescent distribution for the reconstructed tree of all extant species. 
Each of these two methods can be applied to the constant rate model, resulting in the same 
closed formula. In Section 6, we adapt these two methods to the cases of good species and of 
representative species, both of which are much more efficient and accurate than the one given 
in Section 3. In Section 7, we discuss two extensions to our model: including several stages of 
incipientness, and assuming that only a fraction of extant species is sampled. 

2 Model and preliminaries 

2.1 The protracted speciation model 

Following Etienne & Rosindell (2012), we model the dynamics of a phylogeny by a time- 
continuous, (possibly) time-inhomogeneous, (possibly) non-Markovian, two-type process, where 
a birth event is interpreted as the arrival of an incipient species and a death event is interpreted 
as an extinction. We will always assume that species behave independently, that is, that there 
is no diversity-dependence (branching property). We assume that each species gives birth in 
a Poissonian manner, that is, with an instantaneous speciation rate b which is a constant or 
nonconstant function of time. 

Species can be of type 1 or 2, where 1 is the 'incipient' stage and 2 is the 'good' stage. 
Note that the case of several stages will be studied in the last section. The speciation rate is 
assumed to remain constant regardless of stages. At speciation time, say s, the new species 
starts out in state 1. It remains in state 1 for a random duration U s , which is the duration of 
the incipient stage. At time t = s + U s , it can become extinct or change type, that is, turn into 
a good species, with a probability that may depend on s and U s . If it succeeds to turn into a 
good species, it then survives another random duration V s , which is the duration of the good 
stage, after which it becomes extinct. In what follows, the random variables U s and V s may be 
correlated. We will first study the case when their distribution does not depend on s, a case 
referred to as the homogeneous model, because then the dynamics of the diversification process 
is time- homogeneous. We will then focus on the case when the distribution of stages is given by 
instantaneous hazard rates, in which case the diversification process counting the numbers of 
species of all types is Markovian. This case, referred to as the Markov model, divides into the 
inhomogeneous Markov case, where rates can be time-variable, and the homogeneous Markov 
case, which is the model studied most by Etienne &: Rosindell (2012). We will term this latter 
case the constant rate model. Note that the constant rate model is the intersection between 
the homogeneous model (where rates are time-independent but may be age-dependent) and 
the Markov model (where rates are age-independent but may be time-dependent). 

In the Markov model, an incipient species can become extinct at rate ji\ and can turn into 
a good species at rate Ai; a good species becomes extinct at rate ^2- Note that these rates may 
be nonconstant functions of time (inhomogeneous Markov case). Regardless of species type, 
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the speciation rate is b (an assumption written Ai = A3 in Etienne & Rosindell (2012), where 
numbers indexing species status were swapped). We will always make this assumption, but we 
stress that we do not make further assumptions on the other parameters (for example we can 
very well have n\ / ^2). We stress that the constant rate model studied in Etienne & Rosin- 
dell (2012) indeed fits into our general framework assuming that U is exponentially distributed 
with parameter v\ := X\ + [ii, that V is independent of U, that V equals with probability 
Mi/(^i + an d otherwise follows the exponential distribution with parameter ^2. Note that 
then E(U) = v^ 1 and E(V) = (Ai/Vi)^ 1 , so that the diversification process is supercritical 
(exponentially growing number of species with positive probability) iff bE(U + V") > 1, that is, 
6(/x 2 + A 2 ) - v\H2 > 0. 

In the general case, the event {V = 0} is the event that the species becomes extinct before 
turning good. If U is set to 0, the process counting the number of species is a one-type Crump- 
Mode— Jagers process, as studied in Lambert (2009, 2010) and Lambert & Stadler (2013). In 
particular, if U is set to and V is exponentially distributed, then the process is a classical linear 
birth-death process, as studied in Nee et al. (1994), which is (homogeneous and) Markovian. 
It is known that in the aforementioned simple cases, the likelihood of the reconstructed tree can 
be put in product form, meaning that the coalescence times, or node depths, are independent. 
Actually, they are also equally distributed, so the reconstructed tree is a so-called coalescent 
point process (see below). We will show that the reconstructed tree (spanned by all extant 
species, or by all good extant species, or by all representative species) is again a coalescent 
point process, even if U and V are both truly random and possibly correlated, and even if their 
joint distribution is time-dependent. 

2.2 The ultimogeniture order and the definition of representative species 

From now on, we consider a protracted diversification process starting with one (incipient) 
progenitor species at time and conditioned to have extant species at time T. We wish to 
endow its set of species, both extant and extinct, with a total order, regardless of types. 

Recall that in our setting, at each speciation event, we discriminate between the mother 
species and the daughter species (a distinction that can be randomly defined in the Markov 
model, with equal probabilities for each of the two configurations). We can now define an order 
on the set of species, called ultimogeniture order. In what follows, we will say that species a is 
younger than species b if a was born later than b, in forward time. 

Definition 2.1 We define the ultimogeniture order, denoted by the order relation -<, as fol- 
lows. Let a and b be two species with most recent common ancestor species c. If a = c, we set 
a ~< b, and if b = c, we set b ~< a. Otherwise, we let a' and b' be the daughters of c which are 
ancestors of a and b respectively. Then a -< b if a' is younger than b' , and b~<aifb'is younger 
than a! . We will most of the times say that a is smaller than b instead of writing a -< b. 

Another way of defining this order is to recursively label each species by a finite word of 
integers as follows. The progenitor species is labeled 0. Then, if u is the label of a species born 
before T, then the youngest daughter of u born before T is labeled ul, its second youngest 
daughter born before T is labeled u2, and so on. Then the ultimogeniture order is the lexico- 
graphical order associated with this labeling. 
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It is not difficult to see that this order defines a total order on any finite set of species, 
and in particular on the set of species, extant or extinct, of the tree stopped at time T. See 
Figure [T] for an example of a phylogenetic tree with 7 species extant at time T labeled in the 
ultimogeniture order. 
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Figure 1: a) A phylogenetic tree with two stages, starting from one ancestor species born 
incipient at time 0, with 8 extant species at time T labeled in the ultimogeniture order. Dotted 
lines indicate speciation events. Vertical edges start in dashed line, indicating the incipient 
stage, which sometimes turn into a solid line, indicating the good stage. The four species 
2, 3, 6 and 7 are still incipient at time T, and species 2 (resp. species 7) is not representative, 
because the first extant descendant of its most recent good ancestor species is species 1 (resp. 
species 6); b) The reconstructed tree of all species extant at T; c) The reconstructed tree of 
representative species extant at T; d) The reconstructed tree of good species extant at T. 

For any species a with extant descendance, one can define the smallest extant descendant 
of a, or first extant descendant of a as the smallest species in the set of its extant descendant 
species, where 'small' and 'first' are to be understood in the sense of the ultimogeniture order. 
In other words, the first extant descendant of a is the unique extant species b descending from 
a such that b -< c for any other extant species c descending from a. Note that the first extant 
descendant of a can also be defined recursively as the first extant descendant of its youngest 
daughter with extant descendance. 

We now wish to define representative species. As in the infinite alleles model, assume that 
each new good species is given a new type, called allele to avoid ambiguity with the stages, and 
assume this allele is inherited by all its daughter incipient species. As said in the introduction, 
all species with the same allele are seen as satellite populations of the same species that cannot 
be discriminated from each other, so that a phylogenetic tree cannot comprise more than one 
representative of all species sharing the same allele. We want to set up a rule to designate 
the representative species of each allele at time T. Then we will be able to consider the 
reconstructed tree of representative species, as the tree subtended by all species extant at T 
that are representative of some good ancestor species. First, if the good ancestor species is 
extant at T, then it is naturally chosen as the representative species of its allele. If the good 
ancestor species is extinct by T, we should ideally designate the representative species as the 
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smallest of the extant descendants of the good (extinct) ancestor species that share the same 
allele. With such a definition, any extinct good species with extant descending species sharing 
the same allele would be represented. However, it will be mathematically more convenient to 
set up the following alternative rule, which is biologically less satisfying, because an extinct 
good species might have extant descending species sharing the same allele but no representative 
species (see Figure [2] for an example) . 



o _i_ 



Figure 2: A phylogenetic tree with 4 extant species at time T. Species 1 is good at time 
T but all other species are still incipient at T. This figure illustrates the fact that a species 
with extant descendants carrying the same allele can have no representative species. Species 
a has no representative species at T because its first extant descendant (species 1) is a good 
species, and so does not carry the same allele. However, species 2 and 3 are extant species 
both carrying the same allele as species 1. Species b is represented by species 4. 



Definition 2.2 Any good species extant at time T is its own representative species. For any 
good species extinct at time T, if its first extant descendant shares the same allele, then it is 
designated as its representative species, otherwise no representative species is designated. 

A consequence of this definition is that any extant incipient species which is the first extant 
descendant of some extinct good species is a representative species. See Figure [lb for the 
reconstructed tree of representative species extant at T. 

3 An infinite set of coupled ODEs 

Let P(nx,ri2',t) denote the probability that one incipient species born at time has n\ de- 
scendant incipient species and ri2 descendant good species at time t. Also let P(-,n2]t) = 
J2 n >o P( n ii n i'i denote the probability of n-i descendant good species and P(n 1; -;t) = 
^n 2 >0 P{ n li n 2! t) denote the probability of n\ descendant incipient species. 

Etienne & Rosindell (2012) provided an expression of the likelihood of a phylogeny with stem 
or crown age T for the Markov version (with constant rates) of the protracted speciation model. 
This expression is a product of a multiplicative term involving P(-, 0; T) and of evaluations of 
the function / at node depths of the phylogeny (see next section), where f(t) = P(- ; 1; T — t). 

The functions P(-, 0; t) and P(l, ■; t) can in principle be obtained by integrating the (infinite) 
system set of Kolmogorov differential equations satisfied by the functions (P(ni,n2]t)) nitn2 . 
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When there is no extinction (/zi = \i<i = 0), P(n,0;t) and P(n, l;t) can be computed analyt- 
ically by solving the corresponding partial differential equation for the probability generating 
function (Etienne & Rosindell 2012). 

When extinction is non-zero, however, this trick no longer works because the partial dif- 
ferential equations are not analytically tractable. A different trick, used by Kendall (1948) 
and Nee et al (1994) is however possible, under the assumption fj,i = ji2 ='■ f-t- One can then 
view the reconstructed birth-death process as a birth process with time-dependent speciation 
initiation rate bar{t) at time T — t where ar(i) = (b—fj,)/ (b — f/ 1 e~( b ~^( T ~ t ') is the probability 
of survival of the birth-death process with birth rate b and death rate [/,, in T — t time units. 
Thus, we get 

ba T (n - l)P(n - 1,0; t) - {{ba T + \i)n)P(n, 0; t) (la) 

ba T nP{n - 1, 1; i) + A x (n + l)P(n + 1,0; t) (lb) 
(ba T (n + 1) + Xin)P(n, 1; i) 

with initial conditions P(n, 0;0) = 1 if n = 1 and otherwise, and P(n, 1;0) = for all n. 
This procedure has three disadvantages. First, in practice only an approximation can be used 
by truncating the infinite set of ODEs at some arbitrary values of ri\ and n^. Second, the 
set of ODEs is large even for moderate upper limits of n\ and n<i, and hence computationally 
demanding. Third, the procedure is only valid under the assumption that fii = fj,2- In this 
paper, we develop an approach which avoids these three disadvantages. 

4 Coalescent point processes and the reconstructed tree of all 
extant species 

4.1 Coalescent point processes 
4.1.1 Definitions and main properties 

Definition 4.1 A coalescent point process is a random, planar, utrametric tree with edge 
lengths, where tips are numbered 0, 1,2, . . . from left to right, started with a single root point, 
and which satisfies the following two properties, monotonic labeling and independence, to be 
defined below. 

We call T the stem age of this tree, that is, the common graph distance of tips to the unique 
root point. 

1. Monotonic labeling. If Ca+k denotes the coalescence time, (or divergence time) between 
tip i and tip i + k, that is, the time elapsed since their lineages have diverged, then 

Ci,i+k = max{jY i+ i, . . . , H i+k }, (2) 

where Hi := Ci-\^. In particular, the genealogical structure is entirely given by the knowledge 
of the sequence Hi, H2, ■ ■ ■ that we will call either coalescence times or node depths. See Figure 
[3] for a tree satisfying this property. 
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Figure 3: Illustration of a coalescent point process showing the node depths H±, . . . , Hq for 
each of the 6 consecutive pairs of tips. The node depth Hj is the first one which is larger than 
T. 

2. Independence. There is a random variable H (whose probability distribution may de- 
pend on T) such that node depths form a sequence of independent, identically distributed 
random variables, all distributed as H, killed at its first value larger than T. 

Otherwise said, the number Nt of tips in the coalescent point process follows the geometric 
distribution with success parameter P{H > T), and, conditional on Nt = n, the node depths 
Hi,... ,H n are independent copies of H conditioned on H < T. We will call the coalescent 
distribution associated with a coalescent point process the law of H. It will often be convenient 
to use the inverse W of the tail of the coalescent distribution as a way of characterizing it 

We will always assume that H has a density (wrt Lebesgue measure) , so that W is differentiable 
and the density of H, say /, is given by 

4.1.2 Likelihood formulae 

If a reconstructed tree has the law of a coalescent point process with coalescent density / (given 
by / = W'/W 2 , W denoting the inverse of the tail of the coalescent distribution), then the 
likelihood C (conditional on at least 1 extant species) of a reconstructed tree r with stem age 
T, n extant species and node depths x\ < • • • < x n _i is given by 
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where C(r) is some combinatorial constant (see Tajima 1983). If C(t) is the mere likelihood of 
ranked node depths x\ < • • • < x n _i, then C(r) = (n — 1)!, but if C(r) is the joint likelihood of 
the topology, or shape, of r, again with ranked node depths x\ < ■ ■ ■ < x n _i, then C(r) = 2^ T \ 
where i(r) is the number of nodes of r that do not subtend cherries. 

Note that if T is the crown age of r, that is, if the two longest edges of r both have length 
T, then the likelihood C c (r) (the subscript 'c' stands for 'crown age') of the reconstructed tree 
r with crown age T, n extant species and node depths x\ < ■ ■ ■ < £ n -2 (now there are only 
n — 2 node depths strictly smaller than T), conditional on speciation at time and survival of 
the two incident subtrees, is the product, properly renormalized, of the likelihoods of the two 
reconstructed subtrees conditional on survival, which equals 

i=l 

where C(r) was definde previously. This formula can be seen as obtained from the previous 
one by replacing one of the evaluations of / by W(T)~ l = P(H > T). 

Note that if the tree with stem (resp. crown) age T is conditioned to have exactly n tips, 
then the conditioned likelihoods C n (resp. £") become 

C n {r) = C{r)J[f T {x t ) and resp. C n c {r) = J] f T {x t ), (5) 

i=l i=l 

where fr(x)dx = P(H G dx | H < T), that is, f T {x) = f (x)W (T) / (W (T) - 1). Indeed, for 
the crown age, the probability to have n tips conditional on two ancestors each having alive 
descendance at T equals (n - 1)P(H < T) n ~ 2 P{H > T) 2 . 

Finally, we stress that all these likelihood formulae can be generalized to situations when not 
all extant species of the same clade are included in the tree. Indeed, most available phylogenies 
are not complete, in the sense that not all extant species descending from the same ancestor 
species are sampled and included in the phylogeny. In the next paragraph, we show how to 
compute the likelihood of the reconstructed tree of phylogenies which have missing extant 
species. 



4.1.3 Likelihood formulae with missing species 



There are two main ways considered in the literature of randomly removing tips from a phyloge- 



netictree: the binomial model Stadler, 2009 , Lambert, 2009 , Stadler, 2011||Morlon et al, 2010 




Morion et al., 2011 


Hallinan, 2012] and the n-sampling model Stadler, 2009, Etienne et al., 2012 



Note that we can choose to first reconstruct the phylogenetic tree (i.e., throw away extinct 
lineages) and then remove tips from the reconstructed tree or first remove tips from the phylo- 
genetic tree and then reconstruct the sampled tree, because both operations commute. In the 
n-sampling scheme, given a phylogenetic tree (or a reconstructed tree) with more than n tips, 
n tips are selected uniformly (e.g., sequentially) and all other tips are removed. In the binomial 
sampling scheme, or />sampling scheme, given the phylogenetic tree (or the reconstructed tree), 
each tip is removed independently with probability 1 — p, where p is the so-called sampling 
probability. In Subsection I7.2[ we will also consider an extension of this sampling scheme where 
the sampling probability depends on the stage (incipient or good) of the tip species. 
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The n-sampling scheme. The tree obtained after ra-sampling a coalescent point process is 
not a coalescent point process any longer. Assume we start from a coalescent point process with 
height T, with coalescent distribution given by some random variable H, and with a random 
number of tips N > n. Select uniformly n tips among N (selecting uniformly one tip among N, 
then selecting uniformly a second tip among the remaining N — 1, and so on n times). Relabel 
the n sampled tips 1,2, ... ,n ranked in the same order as they were in the initial coalescent 
point process and set H[ the coalescence time between sampled tip i and sampled tip % + 1, 
i = 1, . . . , n — 1. By summing over all possible configurations of sampled tips, it is easy to see 
that for any m > and any x\, . . . , x n —i G [0, T] 

P(N = n + m,H[<x 1 ,...,H' n _ 1 <x n ^ 1 ) = - - \ P(H > T) x 

(n + my. 

x V] P(H < Xl ) mi+1 ---P(H <x n ^) m "- 1+1 P{H <T) mo+mn , (6) 

m:moH hm n =^ 

where the sum is taken over all possible vectors of non-negative integers rh = (mo, . . . ,m n ) 
such that mo + • • • + m n = m. 

It is easy to differentiate ([6]) to get 



P(N = n + m, H[ 6 dx\, . . . , H' n _ 1 G dx n -\)/dxi ■ ■ ■ dx n - 



i 



= f n \ m \, P ( H >T ) E < T) mo+m " TT( m i + l )f( x i) P (H < x t ) m >. 

(n + mj! z — ' J - J - 

rh:m,Q-\ \-m n =m i=l 

If we sum directly over all pairs (mg, m n ), and if we write x n = T, we get 
P(N = n + m, H[ £ dx\, . . . , H' n _ 1 £ dx n -i) / dx\ ■ ■ ■ dx n -\ 

I , n-1 

= . n ' m -. > T) V (m n + < x n ) m " TT(^ + < ^) mi - 

(n + mj! ^ — ' J - J - 

m:miH \-m n =m 1=1 

In other words, the likelihood C s (t) of a reconstructed tree r with stem age T, n sampled 
species, m missing species (i.e., n + m extant species) and node depths x\ < ■ ■ ■ < x n —i, is 
given by (writing again x n = T) 

£ s (t) = £(t) J{{m + l)P{H<x % r (7) 

in + m)\ ±J - 

m:m\~\ \-m n =mi=l 

where C(t) is given by ([3]). A similar line of reasoning shows that the same correction factor 
holds for a reconstructed tree r with crown age T, n sampled species, m missing species and 
node depths »!<•••< x n _2, if now we write x n -\ = x n = T. 



The binomial sampling scheme. The /j-sampling scheme is trivial to handle in our sit- 
uation. Indeed, as is explained in Lambert (2009) and Lambert & Stadler (2013), the tree 
obtained after binomially sampling a coalescent point process with coalescent inverse tail dis- 
tribution W is, conditional on survival, a new coalescent point process with coalescent inverse 
tail distribution W p given by 

W p = 1 - p + P W. 
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In Subsection 17.21 we will extend these computations to cases when sampling probability de- 
pends on the stage of the species. 

4.2 Another total order 

We wish to endow the phylogenetic tree associated with the diversification process with a total 
order which should be consistent with the total order on the set of species defined in Section 2. 
We stress that we think of the phylogenetic tree as a continuous object embedded in continuous 
time, whose elements are all timepoints belonging to edges of the tree, so that this order can 
be seen as a time-continuous process visiting all timepoints in the phylogenetic tree, which we 
call exploration process (Lambert 2010). 

Recall that the phylogenetic tree at time T is truncated at time T, in the sense that all the 
points at distance greater than T from the root point are removed. The exploration process 
starts at the tip of the ancestor species' edge (at distance from the root equal to the extinction 
time of the ancestor species, or T, if the ancestor species is still extant at time T) and explores 
anterior points in this edge, running towards the root at unit speed, until it reaches the birth 
node of the youngest daughter species of the ancestor species born before T; at this time it 
jumps to the edge tip of this daughter species (again, possibly truncated); when the exploration 
of an edge terminates (it always terminates at the birth node of this edge), it is immediately 
followed by the exploration of the mother edge at that node. The exploration is recursively 
defined in this way. This exploration process induces a total order on points of the phylogenetic 
tree, where the smallest element is the tip point of the ancestor edge and the largest element 
is its base point. In particular, the ultimogeniture order defined in Section 2 is also the order 
obtained when ranking species in the order where they appear in the exploration process. 

The contour process X of the phylogenetic tree is a process living in [0, T] and indexed by 
the same meaningless time variable as the exploration process. At any time s, X s is defined 
as the distance to the root of the point visited at time s by the exploration process. Then the 
contour process has positive jumps (the lifetimes of species) and derivative —1 everywhere but 
at jump times (see Figured]). As can be seen in the figure, the contour process can be inter- 
preted as the height of a ball that slips down the right-hand side of edges of the phylogenetic 
tree (embedded in the plane) at unit speed, and bounces back up to the next edge tip on its 
right each time it encounters a dashed line. 

As was shown in Lambert (2010), this contour process X is a Markov process which jumps 
at rate b and makes jumps that are distributed as a species lifetime (that is, as U + V), which 
is truncated to T when a jump overshoots T, and is killed when it hits 0. The number of 
visits of T by this process is exactly the number of extant species at time T. Each time the 
contour process visits T, it makes a new excursion below T which can either terminate by 
hitting (end of the exploration) or by hitting T (visit of a new extant species). Now recall 
that the excursions of a Markov process away from a given point (here, T) are independent and 
identically distributed (hd). Also observe by a quick inspection of Figure [4] that the coalescence 
time between two consecutive species visited by the contour process is exactly the depth of the 
excursion below T starting at the first of these two visits and ending at the second one. This 
shows that, regardless of types, the reconstructed tree of all extant species at T has iid node 
depths, all distributed as the depth of an excursion of X below T. In particular, this ultrametric 
tree is a coalescent point process, whose coalescent distribution is the law of the depth of an 
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Figure 4: Top panel: A tree with edges in bold and speciation events shown by horizontal 
dashed lines (all horizontal edges have zero length); the species born before T are labeled in 
the ultimogeniture order, and three zones of the tree are labeled by letters a, b and c. Bottom 
panel: The contour process associated with the same tree after truncation at time T; edge 
labels are reported on top of each corresponding jump; epochs of visits of zones a, b and c by 
the contour process are indicated. 

excursion below T. We record this in the following statement. 

Proposition 4.2 Under the protracted speciation model, conditional on at least one extant 
species at time T , the reconstructed tree spanned by all species extant at T regardless of their 
types is a coalescent point process. The associated coalescent distribution is the law of the depth 
of an excursion away from T, made by the stochastic process X which jumps at rate b(s) when 
at s, with jump size distributed as U s + V s , and has slope —1 everywhere else. 

Now with the last proposition in mind, equations ([3]) and ([5]) yield an expression for the 
likelihood of reconstructed trees of all extant species under the protracted speciation model, 
provided we can compute the associated coalescent distribution. The goal of the next section 
is to perform this computation. 
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5 Computation of the coalescent distribution for the tree spanned 
by all extant species 



In this section, we treat in detail the case of the tree spanned by all extant species. We will 
build on these developments to give a more straightforward treatment of the trees, spanned by 
good species and by representative species, in the subsequent section. 

From now on, we denote by H the random variable associated with the coalescent point 
process of all extant species (see Proposition I4.2|) . and we set 

the inverse of the tail of the coalescent distribution. We now show how to compute this function 
in the homogeneous model first, and then in the Markov model. We will then show how to use 
either of these methods to treat the constant rate model. 



5.1 The homogeneous model: a Laplace transform 

In the homogeneous model, neither b nor the law of (U, V) depend on time. Then W can be 
computed from the knowledge of the speciation rate b and from the law of the total species 
lifetime A := U + V. 

Let ip be the so-called Laplace exponent of the process X in the homogeneous model. The 
function ip is a convex function on [0, oo) that characterizes the law of X and therefore only 
depends on the law of the total species lifetime A := U + V and of the speciation rate b. More 
specifically, 

/>oo 

iP(s) = s-b + bE(e- sA ) = s-b + b e~ sx P(A G dx) s > 0. 

Jo 

Then it is known (Bertoin 1996) that W is the unique non-negative function g on [0, oo) 
satisfying 

e- s *g{x)dx = -^- (8) 
ip(s) 

for all s greater than the exponential growth rate of the tree. The coalescent distribution can 
therefore be computed by inverting the previous Laplace transform. Indeed, recall from Section 
[4]that the density of the coalescent distribution, say /, is then given by f(y) = W'(y)/W{y) 2 . 



L 



5.2 The Markov model: extinction probabilities 

Let us turn to the Markov model. Recall that in the Markov model, an incipient species can 
become extinct at rate /ii and can turn into a good species at rate Ai; a good species becomes 
extinct at rate ^2] the speciation rate is b, regardless of species type; all these rates may be 
nonconstant functions of time. 



Convention. From now on, rates are expressed backwards from the stem age T, in the sense 
that b(t) stands for the speciation rate at absolute time T — t, and similarly for other rates. In 
particular, 6(0) is the speciation rate at present time. 
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The important idea behind the contour analysis is that for any integer n smaller than the 
total number of tips: 1) all points of the tree visited after the visit of the (n — l)-th tip belong 
to subtrees that are independent of the past of the exploration process (the part of the tree 
visited before this visit), and 2) that those subtrees branch off the lineage joining the root 
to the (n — l)-th tip, in a Poissonian manner, with inhomogeneous intensity b, that is also 
independent from the past of the exploration process. 

Now the coalescence time between species n — 1 and species n is greater than y if and only 
if all subtrees that have branched off this lineage at absolute times belonging in (T — y,T), 
do not have descending species by time T. If qi(t) denotes the probability that a species in 
the incipient stage at absolute time T — t has no extant descending species at absolute time 
T (extinction probability), then dtb(t)(l — qi(t)) is the probability that there is a subtree 
sprouting in the interval (T — t,T — t + dt) and surviving up to T, so that the zero-th term 
of the Poisson distribution of subtrees sprouting between T — y and T and surviving up to T 
equals 

P{H>y) = exp(- jT dtb(t)(l- qi (t))\ y>0. (9) 

Then the problem moves to characterizing the function q\ . We do not have a closed formula for 
this function, but we know that the pair (51,(72) satisfies a system of Kolmogorov differential 
equations, where 52 (t) is the probability that a species in the good stage at absolute time T — t 
has no extant descending species at absolute time T. This 2D differential equation is given by 



qi = ~{v\ + b)q\ + Ai<? 2 + Mi + bq\ 



(10) 



with initial conditions qi(0) = and 92(0) = 0. Recall that v\ = Ai + fix, and that all rates b, 
X%, fj,i, fj,2 may depend on time. 

Setting g{y) := P(H > y) and recalling that / is the density of H, we get / = —g, and by 

©, 

g = -b(l- qi )g, (11) 

so that 

f(y) =b(y)(l- qi (y)) exp (- J" dtb(t) (1-qi (t))) y > 0. (12) 

If one numerically solves (fTUj) , then one can plug q\ into (fT2j) to get / and hence the likelihood 
of any reconstructed tree, which is proportional to the product of evaluations of / at node 
depths (see e.g. equation ([3])). To get /, one can equivalently integrate (fTTj) (initial condition 
<?(0) = 1) simultaneously with (jlOp . and then use / = b{\ — q\)g to avoid computing the integral 
in (TT2|). 

If the dependence on t by the instantaneous rates is piecewise constant as in Stadler (2011), 
then this numerical method should be particularly stable. 



Remark 1 Note that the second equation in (jlOp can be integrated as 

Q2(t) = j fi 2 (y) dy exp ^- J ds ^ 2 (s)J exp ^- J ds b(s) (1 - qi(s)) 
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5.3 The constant rate model: computation of the coalescent distribution 
and of extinction probabilities 



Recall that the constant rate model can either be seen as a particular case of the homogeneous 
model, by specifying the probability distribution of (U, V) as the one described in Section 2 (U 
is exponentially distributed with parameter v\ := Ai + \t\, V is independent of U, and either 
V equals 0, with probability Hi/{Xi + fJ>i), or it is exponentially distributed with parameter 
H2), or as a particular case of the Markov model, by assuming that rates are constant through 
time. 

Seeing the constant rate model as a particular homogeneous model, we can invert the 
Laplace transform in ([8]) after specifying the distribution of (U,V). Alternatively, seeing this 
model as a particular Markov model, we can compute the solution to (|10p with time-constant 
rates and plug the solution into ([9]). 



Let Q be the following polynomial of degree 2 

Q(s) = s 2 + ([12 + v\ — b)s + - b/j,2 - b\i. 
It is easy to see that Q always has two distinct real roots a < (3 given by 

- (b — vi—H2 — VK^) and /3 = i (b — vx— ^2 + 



a 



where K := (b + fj, 2 — v\) 2 + 46Ai. It is also easy to see that a is always negative, so that 
f3 > if and only if 1^2(^1 — b) — b\\ = a/3 < 0, that is, in the supercritical case. Actually, 
it can be shown that in this case, (3 is the Malthusian parameter of the process counting the 
overall number of species (incipient or good). In other words, conditional on nonextinction, the 
overall number of species grows exponentially with exponent (3. Indeed, forthcoming equation 
(I16p shows that (3 is the (only) positive root of tjj, which is shown in Lambert (2010) to be the 
Malthusian parameter of the corresponding branching process. 

We now give a closed formula for W, which is the inverse of the tail of the coalescent 
distribution for the reconstructed tree of all extant species. 

Proposition 5.1 When /^(^l — b) — b\\ / 0, we have (3^0 and then 

W{y) = a + ai e ay + a 2 e^, (13) 

where 

_ _ {a + Vi)(a + fj, 2 ) _ b(a + Ai + /z 2 ) (P + + M2) _ Hf3 + Ai + ^2) 

af3 ' a(a — (3) a(a — (3) ' f3(j3 — a) (3(f3 — a) 

When (J, 2 (vi — b) — b\i = 0, we have (3 = 0, a = b — v\ — jj, 2 , and 

W{y) = b + b l e ay + b 2 ye ay , (14) 

where 

h - l _ -1 h l _ b(b-[ii) 
t>o — — 5-, Ox — i. — Oo, 02 — . 
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We show this proposition by two different methods. Let us first use the results of the homoge- 
neous model and proceed by Laplace transform inversion. With our distributions of U and V, 
we get 

E (e~< u + v A = ^ l(s + ^ ) + Al/ f , (15) 

so that 



(s + vi)(s + fl 2 ) 
ip(s) = - -7 -. (16) 



Elementary calculus then yields 



oo 



o 



V>(s) sQ(s) s s — a s — f3 



Depending on the signs of ao, ai and a 2 , we can invert this Laplace transform to get the 
announced result that W(y) = ao + a\e ay + a 2 eP y ■ The method is to substract all terms 
corresponding to negative coefficients among ao,ai,a2, to equate the Laplace transforms of 
two positive functions and conclude by the injectivity argument. For example, if a 2 < 
whereas ao,ai > 0, then y h- > W(y) — a 2 e l3y is a positive function whose Laplace transform 
equals s h- >• ^ + which is the Laplace transform of the positive function y \— > ao + aie Qy , 
hence the equality W(y) — a 2 e^ y = ao + a±e ay . 

We can do the same kind of calculations as previously in the case when (3 = 0. 

Let us now show how to apply the method developed for the Markov model. Note that 
here, because of time homogeneity, qi(t) (resp. q 2 (t)) is the extinction probability in t time 
units starting from one incipient species (resp. from one good species), regardless of the value 
of starting time. Recall from Q that 

W(y) = e W (b [ V dt(l- gi (t)) 



\ Jo 
so that 

w 

« l = 1 -w (17 > 

an expression also displayed in Lemma 3.1 in Lambert (2011). Plugging this into the first line 
in ([TO]) , we get 

Now if we plug the last two equalities into the second line of ()10p . we get 

W" + (jm + vx- b)W" + {vim - bfi 2 - bX^W = 0. 

Then W' is the solution to a second-order, linear differential equation, whose characteristic 
polynomial is Q. As a consequence, W indeed is a linear combination of exponentials with 
exponents a and /3, so that W is a linear combination of exponentials with exponents 0, a and 
f3. We omit the detailed computation of the coefficients of this linear combination. 

As a side result, we get the following expression for the extinction probabilities using (fT7]) 
and CED. 
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Corollary 5.2 In the constant rate model, as soon as [i 2 {y\ — b) — b\\ ^ 0, the extinction 
probabilities qi(t) (resp. q2(t)) in t time units starting from one incipient species (resp. from 
one good species) are given by 

_ ba + ai(b- a)e at + a 2 (b - /3)e^ 

9lW ~ b(a + a ie at + a 2 e^) ' 

and 

n\ — ^i°o + oiM2(a + v\ — b)e at + a 2 fi 2 ((3 + — b)e^ 1 
q2[ ) ~ 6Ai (a + a ie at + aae/ 3 *) ' 

In i/ie critical case, that is, when [i 2 {y\ — b) — bX\ = 0, recall that j3 = 0, a = b — v\ — fi 2 < 0, 
and we ge£ 

, , _ bb - (bb + a)e at - bzjui + ^)te at 
Ql[) ~ b(b + b ie at + b 2 te at ) 

and 

_ frAifep - (6Ai6 + ab + ^)e Qf - 6 2 ^te af 
~ 6Ai (6o + he at + b 2 te at ) 

Remark 2 In the critical (f3 = 0) and subcritical ((3 < 0) cases, we see that the extinction 
probabilities increase exponentially fast to 1 as t — >• oo. In the supercritical case, /3 is positive 
and the extinction probabilities converge as t — >■ oo to the overall extinction probabilities respec- 
tively equal to 1 — (f3/b) (when starting from one incipient species) and to fJ> 2 ((3 + V\ — b)/(b\i) 
(when starting from one good species). 



6 The reconstructed trees of good species and of representative 
species 

6.1 The reconstructed tree of good species 

We show how to use the contour process to prove that the reconstructed tree of good species 
is a coalescent point process. We first make two observations. 

First, by the monotonicity property of the coalescent point process, the coalescence time 
between two good species i and j is the maximum of coalescence times of all consecutive pairs 
of species numbered i, i + 1, . . . ,j. So if we can infer from the contour process which extant 
species are good and which extant species are incipient, we will be able to characterize the 
reconstructed tree of the good species. This the goal of our second observation, for which we 
need some notation. 

For each species extant at T, we call A its age at time T and U its age when it turns good 
(see Figure [5]). If U > A, then the species is still incipient at T, otherwise it is a good extant 
species. In terms of the contour process, an extant species corresponds to a jump starting below 
T and ending above T (before truncation). The age A of the corresponding extant species is 
called the 'undershoot' of this jump. Because the triple of the depth H of the excursion, of the 
age A of the species by which the excursion ends and of its age at maturity U, is a function of 
the same excursion, all the triples (H, A, U) running over all extant species, are independent 
and identically distributed. 
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Figure 5: An excursion of the contour process away from T, showing its depth H, the un- 
dershoot A of its terminating jump, which is the age of the corresponding extant species (the 
species whose lifespan traverses time T), the age U at which it turned good, and the time V is 
survived after turning good. In this example, this extant species is a good species at time T, 
because U < A. 



Excursions satisfying U < A correspond to good species. Between two consecutive such 
excursions, we have a (geometric) number of excursions satisfying U < A, and we have to take 
the maximum of their depths H to get the coalescence time between the two consecutive good 
species. If we denote by H 9 the associated random variable, we get 

oo 

P(H 9 <y) = Y,P(H <y,U> A) n P(H <y,U<A), 

n=0 

because H 9 < y if and only if all the depths of in-between excursions (terminating with a 
species which is still incipient at T) are smaller than y. This can be recorded in the following 
statement. 

Proposition 6.1 Conditional on at least one good species extant at time T, the reconstructed 
tree spanned by extant good species is a coalescent point process. Its associated coalescent 
distribution is characterized by 

P(H 9 < „) = P(H<V,U<A) 

<y) i-p( H <y,u> a)- 1 yj 

Recall from ([3]) and (|5|) that the knowledge of the coalescent distribution is sufficient to compute 
the likelihood of the reconstructed tree of extant good species under the protracted speciation 
model. We now show how to perform this computation, in the same vein as in the previous 
section. 



6.2 Computation of the coalescent distribution for the tree spanned by good 
extant species 

The following statement shows how to recover the law of H 9 in the homogeneous model. 
Recall that the inverse W 9 of the tail of the coalescent distribution of good species is defined 
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by W 9 (y) = 1/P(H 9 > y) and that the inverse W of the tail of the coalescent distribution 
associated with the tree spanned by all extant species can be recovered by inverting the Laplace 
transform (JS]). 

Proposition 6.2 In the homogeneous model, the function W 9 is given by 

W 9 {y) = W{y)-b P ' W(y - x) P(U > x) dx. (20) 
Jo 

Note that the obvious inequality W g < W implies that P(H 9 > y) > P(H > y) for all y, 
confirming that the node depths of the good species tree are larger than the node depths of 
the reconstructed tree of all extant species. 

Proof. We use the following distributional equation (see e.g. |Kyprianou, 2006 Lambert and Trapman, 2013| ) 
characterizing the joint law of the depth H of an excursion, of the size D of the jump termi- 
nating this excursion, and of the undershoot A of this jump, in terms of the law of the total 
species lifetime A = U + V 

W(v - x) 

P(H <y,Ae dx,D e dz) = b — yy , - ' dx P(A G dz) < x < mm(y, z). 

W{y) 

Then we get 

f'tj poo 

P(H < y,U < A) = P(H <y,A£ dx,U < x,D £ dz) 

J x=0 J z=x 
py /"oo 

= P{H <y,Ae dx,D £dz)P(U <x\U + V = z) 

J x=Q J z=x 

ry roc yyr _ \ 

= b / r 1 P(A G dz) P(U < x\U + V = z)dx 

J x=0 J z=x W(y) 

= b r r W ri7? p(u <x,u+v e dz) dx 

J x=0 J z=x W{y) 

rv w( v _ x ) 

= b / i \ p (U < x,U + V > x)dx. 

J x=0 W{y) 

Similarly, we obtain 

P(H <y,U> A) = b F W %~ X) P(U>x)dx. 
Jo W{y) 

Equation (|20p then simply stems from plugging the last two equalities into (|19p . □ 

For the Markov model, we can repeat the same argument as that given in the previous 
section, to get 

P(HS>y) = ew(-J"dtb(t)(l-]fi(t))\ y > 0, (21) 

where p\{t) is the probability that a species in the incipient stage at absolute time T — t has 
no good descending species extant at absolute time T. Again, we do not have a closed formula 
for p 9 , but the pair (pf,pf) satisfies the following system of Kolmogorov differential equations, 
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where p 9 2 (t) is the probability that a species in the good stage at absolute time T — t has no 
good descending species extant at absolute time T. 



■9 

Pi 
■<-! 



-(yU 2 + fe)p|+)U2 + bPlP| ; 



(22) 



with initial condition p\ (0) = and p^O) = 1- Note that these boundary values are the only 
differences between the previous system (|22"j) . satisfied by the probabilities (pfjpf)' ano ^ the 
system (fTUj) satisfied by the extinction probabilities (qi,^)- 

We now turn to the constant rate model. In this case, we can provide a closed formula 
for the coalescent distribution. Recall the polynomial Q from the previous section and its two 
distinct real roots a < (3. 



Proposition 6.3 When /^(^l — b) — b\i 7^ 0, we have (3 ^ and then 

ap ap(p — a) 

When iiiiy\ — b) — bX± = 0, we have (3 = 0, a = b — v\ — ^2, and 

W 9 {y) = 1 + ^ (e * - 1 - ay) y > 0. 



(23) 



(24) 



Proof. We first use the method of the homogeneous model. In full generality, we can always 
define the function F as the Laplace transform of the non-negative function W 9 , that is, 

F(s):= / dye- sy W 9 (y)= / dye~ sy W(y)-b dxW(y-x)P(U>x 
Jo Jo L Jo 



By 



and an integration by parts, we get 

1 b 



F(s) 



■00) 1p(s) 

1 b 



/•oo 

/ dye- s yp(U>y) 
Jo 

III' 00 

/ dy e~ sy P(U S dy) 

s s J 



ifj(s) ij)(s) 

1 s-b + bE{e- sU ) 
s s-b + bE{e-< u+v y) 

If we are able to invert this Laplace transform, we get a closed form for W 9 , and hence for the 
tail distribution of H 9 . We have already computed the Laplace transform of U + V in (|15|) . 
and trivially E(e~ sU ) = v\j(y\ + s). Plugging these formulae into the general expression for 
the function F yields 



F(s) 



(s + vi- b)(s + fj, 2 ) 
sQ{s) 



co + _ 

s s 



Cl 



+ 



C2 



O 
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as soon as f3 ^ 0. Elementary calculus yields 



H2{vi — b) (a + v\ — b)(a + /X2) bXi f/3 + v\ — b) (ft + /i 2 ) &Ai 

c = 3 1 c l = 7 7T\ = —< m) c 2 - 



a/3 ' i_ a(a-/3) a(a-/3)' 0{p - a) 0{p - a) 

This allows us to invert the Laplace transform of W 9 as done in the previous section to get 
W 9 (y) = Co + c\e ay + C2e@ y , which is the announced expression (|23p . Similar calculations can 
be done in the case when f3 = to get (|24p . Note that we could have as well used the expression 
of W computed in (fT3|) and {HI), and plugged them into ([20]) . 



Similarly as for the reconstructed tree of all extant species, we can also apply the method 
used in the Markov model. Indeed, because 

W 9 {y) = exp (b J* dt (1 - p 9 (t)) 

then 

g _ -, W 9 ' 

Pl ~ bW 9 ' 

and it is easily seen that W 9 ' solves the same second-order, linear differential equation as W. 
The solving details are omitted. □ 



6.3 The reconstructed tree of representative species 

We now deal with the case of representative species. Recall from Definition 12.11 the ultimogen- 
iture order defined on the set of species born before time T, where species a is smaller than 
species b if their only respective ancestor species a' and b' which were sisters verify that a' is 
younger than b' . Also recall from Definition 12.21 that a representative species is either a good 
extant species or an incipient extant species which is the first extant descendant of some extinct 
good species. We want to show that we can again use the contour technique to characterize 
the reconstructed tree of representative species. Specifically, to ensure that the reconstructed 
tree of representative species is a coalescent point process, we have to prove that the event that 
an extant species is representative only depends on the excursion of the contour process that 
precedes its visit. 

Let u denote some extant species. In the previous subsection, we have argued that the 
event that an extant species is good or incipient depends only on the corresponding excursion 
of the contour process. Roughly speaking, an extant species is good iff the last jump of the 
corresponding excursion has a big enough undershoot A, that is, has U < A. Now we claim 
that u is representative if there is at least one of its ancestor species first visited during the 
corresponding excursion, say a, which is good. Indeed, if this last event occurs, then species u 
is representative by definition, because it is the first extant descendant of all species a satisfying 
this property, and so is representative of the most recent one among them. Conversely, if u 
is representative, then its most recent good ancestor species, say a (the species it represents), 
must be visited for the first time during the excursion. If this was not the case, then a would 
have another extant descending species previously visited by the contour process, and by defi- 
nition of the contour process, this species would be smaller than u. Then u would not be the 
smallest extant descending species of a, which contradicts the fact that u represents a. 
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Now we call o~(u) the mother species of u, o~ 2 (u) its grandmother species, and so on. We 
also set J(u) the maximum integer k such that a k (u) was visited for the first time during the 
corresponding excursion. We call Aq the age of u at time T, and for i > 1, we call Ai the 
age at which o~' L (u) gave birth to <7* -1 (ii) and Ui the age at which it turns good. Then u is 
a representative species iff there is < j < J(u) such that £/j < Aj. In terms of the contour 
process, one can detect if an extant species is a representative species if at least one jump 
of the future infimum of the corresponding excursion has a big enough undershoot, that is, 
has U < A. We can express this in the following statement, which is the exact analogue of 
Proposition 16.11 



Proposition 6.4 Conditional on at least one representative species extant at time T, the re- 
constructed tree spanned by extant representative species is a coalescent point process. Its 
associated coalescent distribution is characterized by 

p{H r <y) _ P(H<y, 30 < i < J(u),Ui< A^ 



1 - P{H < y, V0 < i < J{u),Ui < Ai) 

Unfortunately, we were not able to make a further characterization of the coalescent distri- 
bution in the homogeneous model, as was done in Proposition 16.21 for good species, so we now 
turn to the Markov model. 



Applying the same arguments as in the last two sections we see that the coalescence time 
between two consecutive representative species is greater than y if and only if all subtrees that 
have branched off the lineage of the first one at absolute times belonging in (T — y,T), do not 
have any good descending species that has extant descending species by time T. Therefore 

P(H r >y) = e^{- f\tb(t)(l-p\{t))^ y>0, (25) 

where Pi(t) is the probability that a species in the incipient stage at absolute time T — t does 
not have any good descending species that has extant descending species at absolute time T. 

Again, the problem moves to characterizing the function p\ and is solved by observing that 
p\ is solution to the following Kolmogorov differential equation, with initial condition p[{0) = 1, 

p\ =-(>! + b)p\ + Aig 2 + m + b(p[) 2 , (26) 

where we remind the reader that (t) is the probability that a species in the good stage at 
absolute time T — t has no descending species at absolute time T. Recall that in the constant 
rate model, this extinction probability is given by Corollary 15 .21 Otherwise, it can be computed 
thanks to the two differential equations (jlOp . so that the coalescent distribution stems from 
solving a set of 3 differential equations. Recall that the method proposed in Section 3 formally 
required solving an infinite number of coupled ODEs and does not allow differences in extinction 
rates between good and incipient species. 



7 Extensions 

7.1 More stages of incipientness 

Here, we want to extend the Markov model to a model where species can have a fixed number, 
say I — 1, of stages of incipientness, before turning good. Namely, we assume that newborn 
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species start in state 1 as the first stage of incipientness, 2 the second stage, until they go 
through stage 1 — 1, and finally stage /, which is the 'good' stage. Assume again that re- 
gardless of species status, species give birth (speciate) at the same rate b. More specifically, 
Xj is the rate at which a species of type j becomes type j + 1 (1 < j < I) and fij is the 
extinction rate of species of type j (1 < j < I). The notation is chosen to be consistent 
with the constant rate model, which can be obtained by taking 1 = 2. Actually, we will now 
see that this model is a particular case of the homogeneous model treated throughout the paper. 

Indeed, this model can also be expressed in terms of the durations Vx, ■ ■ ■ , Vj of successive 
stages. Start with independent random variables U±, . . . , Uj, where Uj is an exponential random 
variable with parameter Vj := Xj + fij if 1 < j < I — 1, and Vj = fij if j = I. Also let 
£\, . . . ,£i-i be independent random variables, where £j is a Bernoulli random variable with 
success probability Xj/vj. Set 

N := min{l < j < I - 1 : sj = 0}, 

which is set to / if this last set is empty. Then we can define Vj := Uj if j < N and Vj := 
otherwise, as the stage durations of a typical species. More specifically, the species terminates 
its lifetime in state N, its total lifetime duration is U\ + • • • + Un, and it is in stage j at age 
t if Vi + • • • + Vj-x <t<V\ + -- - + Vj,l<j<N. In particular, the species turns good iff 
N = I. Actually, this model is a particular case of the homogeneous model, if we set 

U ■-V x + --- + V I -x and V = Vj, 

so that we can apply results specifically pertaining to the homogeneous model (equation flSJ) 
and Proposition 16. 2p to this new model. This involves inverting the Laplace transform ([8]) and 
compute the distribution function of XJ. We leave the details to the interested reader. Because 
we do not have specific results in the homogeneous model for the case of representative species, 
we now explain how to adapt the arguments expanded in the case I = 2 to the case I > 2. 
Similar reasoning leads to an alternative route as that proposed previously for the treatment 
of reconstructed trees spanned by all extant species or by good extant species. This route is 
detailed explicitly in the next subsection, in the more general setting where some extant species 
can be missing. 

Assume again that an extant species is representative iff it is the first extant descendant of 
some good species. Then equation ([25]) still holds, namely 

P{H r >y) = exp (- jT dt b(t) (1 - p\ (i))) y > 0, (27) 

where Px(t) is the probability that a species in stage 1 at absolute time T — t has no good 
descending species that have extant descending species at absolute time T. The problem is now 
to characterize the function p\. Similarly as in the previous section, the functions p^ satisfy 
the following differential equations, where Pj(t) is the probability that a species in stage j 
at absolute time T — t has no good descending species that have extant descending species at 
absolute time T. For any 1 < j < I — 1, 

fj = - (yj + b)p r j + Xjp r j+1 + Uj + bp r lP r j, (28) 
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with initial condition Pj(0) = 1, and where p\ = qi is the probability that a species in the good 
stage at absolute time T — t has no extant descending species at absolute time T. 

To compute this extinction probability qi, we let qj(t) be the probability that a species 
in stage j at absolute time T — t has no descending species at absolute time T, so that the 
functions qj satisfy the following differential equations. For all 1 < j < I — 1, 

9j = -( v j + b)qj + ^jQj+i + fJ-j + hiQj, (29) 

and for j = I, 

qi = -(w + b)qi + fii + bqtqi, (30) 

with initial conditions qj(0) = for all 1 < j < I, which are the analogues to equations (|10p 
satisfied by extinction probabilities in the case 1 = 2. To sum up, there are 21 — 1 differential 
equations to solve in order to get the coalescent distribution (|27p . First, one has to solve the 
previous system of I differential equations to compute the extinction probabilities qj , and then 
plug qi = p r j into the system (|28p of I — 1 differential equations to get p\ . 



7.2 Missing species 

In this subsection, we complete the calculations made in Subsection 14.1 .31 for trees with missing 
species under a binomial sampling scheme, when sampling probability depends on species 
status. We will treat this case only in the Markov model, but for the sake of completeness, we 
will consider the full generality of I stages of incipientness, as in the previous subsection. 

From now on, we define pi as the probability of being sampled at T for an extant species in 
stage i. We wish to compute the likelihood of the reconstructed tree of all sampled species or 
of sampled representative species. Notice that the reconstructed tree of (sampled or not) good 
species can be seen as the reconstructed tree of all sampled species in the special case when 
Pi = as soon as i ^ I (recall that I is the stage of good species). 

The reconstructed tree of all sampled species (resp. of all representative species) is again 
a coalescent point process, and the common density / (resp. f r ) of its typical node depth H 
(resp. H T ) satisfies the same ordinary differential equations as previously, but with different 
initial conditions. Let us give a conclusive, self-contained summary of these results. 

We first modify slightly the definitions of the quantities qj(t) and Pj(t). We now let qj(t) 
stand for the probability that a species in stage j at absolute time T — t has no descending 
species sampled at absolute time T. The functions qj still satisfy the differential equations (|29p 
and (|30p . but with initial conditions ^ (0) = 1 — pi for all 1 < j < I. Recall that it is possible, 
for example, to recover the probability p 9 - that a species in stage j at absolute time T — t has 
no good descending species at absolute time T, by taking pj = 1 if i ^ I and pi = 0. 

Now we let Pj(t) be the probability that a species in stage j at absolute time T — t has 
no good descending species that have extant descending species sampled at absolute time T. 
Then the functions p^ still satisfy the differential equations (|28p . again with p r j = qi, and with 
the same initial conditions p^(0) = 1. 

Now similarly as in Section [U we set g(y) := P(H > y) and g r (y) '■= P(H r > y), so that 
/ = —g and f r = —g r . It is easy to see that 

P(H >y)= exp (- ^ dt b(t) (1 - qi (t))\ y>0, 
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and that 

P(H r >y) = exp (- ^ dt b(t) (1 - pl(t))^ y>0, 

so that 

9 = ~b(l - q\)g and g r = -6(1 - p\)g. 

Each of the previous two equations can be solved simultaneously with those satisfied by the 
probabilities (%) and/or (pj). Then use / = — g = 6(1 — q\)g and f r = —g r = 6(1 —p\)g. 
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